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Abstract. We describe the astrophysical and numerical basis of A r -body simulations, both of collisional 
stellar systems (dense star clusters and galactic centres) and collisionless stellar dynamics (galaxies and 
large-scale structure). We explain and discuss the state-of-the-art algorithms used for these quite different 
regimes, attempt to give a fair critique, and point out possible directions of future improvement and 
development. We briefly touch upon the history of iV-body simulations and their most important results. 

PACS. 98.10.+Z Stellar dynamics and kinematics - 95.75.Pq Mathematical procedures and computer 
techniques - 02.60.Cb Numerical simulation; solution of equations 
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Cl, 1 Introduction 

2 Of the four fundamental forces, gravity is by far the weakest. Yet on large distances it dominates all other interactions 
. owing to the fact that it is always attractive. Most gravitational systems are well approximated by an ensemble of 
' point masses moving under their mutual gravitational attraction and range from planetary systems (such as our own) 
l— — ', to star clusters, galaxies, galaxy clusters and the universe as a whole. 

I ■ Systems dominated by long-range forces such as gravity are not well treatable by statistical mechanical methods: 
| energy is not extensive, the canonical and micro-canonical ensembles do not exist, and heat capacity is negative 
{SJ ■ Moreover, gravitational encounters are inefficient for re-distributing kinetic energy, such that many such encounters 
00 ' are required for relaxation, i.e. equipartition of kinetic energy. Gravitational systems, where this process is potentially 
. important over their lifetime are called 'collisional' as opposed to 'collisionless' stellar system^]. The timescale over 
• which this so-called 'two-body relaxation' is important is roughly [l[ 

in '. 

o ■ N 

irclax ~ 81n/l * dyn ! ' ' 



X 



where N is the number of particles; idyn is the dynamical tim^3; \nA = ln(6 max /&min) is called the Coulomb Logarithm; 
and &max and 6 m i n are the maximum and minimum impact parameters for the system, respectively. Collisional systems 
usually have a high dynamic age (idyn short compared to their lifetime) and high density, and include globular star 



?H ■ clusters and galactic centres. The majority of stellar systems, however, are collisionless. 

The numerical simulation of iV-body systems has a long history, dating back to the very first light-bulb experiments 
of Holmberg Q in 1941. The first computer calculations were performed by von Hoerner Q in 1960 who reached 
N = 16, paving the way for the pioneering work of Aarseth [1| in 1963 with N = 100. Since these early works, N has 
nearly doubled every two years in accordance with Moore's law (see Fig.Q]). The latest collisional iV-body calculations 



1 We shall use the term 'stellar system' for a system idealised to consist of gravitating point masses, whether these are stars, 
planets, dark-matter particles, or whatever. 

2 The dynamical or crossing time is a characteristic orbital time scale: the time required for a significant fraction of an orbit. 
A useful approximation, valid for orbits in inhomogeneous stellar systems, is td yn — (Gp)~ 1//2 with p the mean density interior 
to the particles current radius [l|. Note that the complete orbital period is longer by a factor ~ 3. 

3 The minimum impact parameter is well defined for a system containing equal mass point particles: 6 m i n = 2Gm/a 2 , where a 
is the velocity dispersion of the background particles @] ■ The maximum impact parameter is related in some way to the extent 
of the system, but is less well defined. Fortunately, even an order of magnitude uncertainty in & max has only a small effect since 
it appears inside a logarithm. Notice that the form of the Coulomb Logarithm has a profound implication: each octave of b 
contributes equally to \nA such that relaxation is driven primarily by weak long-range interactions with b 3> &min- 
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have reached over 10 6 particles @, while collisionless calculations can now reach more than 10 9 particles 0413 • This 
disparity reflects the difference in complexity of these rather dissimilar iV-body problems. The significant increase in 
N in the last decade was driven by the usage of parallel computers. 

In this review, we discuss the state-of-the art software algo- 
rithms and hardware improvements that have driven this dra- 
matic increase in N. We consider the very different challenges 
posed by collisional (§2J versus collisionless (Sj3|) systems, and 
we attempt to give a fair critique of the methods employed, 
pointing out where there is room for improvement, and dis- 
cussing some interesting future research directions. Our focus 
is primarily on gravity; we do not consider the important role 
that the other fundamental forces plajQ- Since our goal is to 
elucidate the numerics, we will only touch upon the many in- 
teresting and important results that have come out of A^-body 
modelling over the past 50 years. We must therefore apologise 
in advance for all that is missed out in this brief review. We do 
not have space to discuss modelling gas physics and its many 
complications. Nor will we discuss the art of setting up initial 
conditions for TV-body simulations. 

There are already several iV-body reviews and books in the 
literature. Aarseth [25| and Heggie & Hut (2(| give excellent 
reviews of the A-body problem, focusing mainly on collisional 
Af-body simulations. Hockney & Eastwood [27| cover many 
aspects of particle-based simulations, focusing on collisionless 
applications. Trenti & Hut [28| give a general overview of the 
A^-body problem, covering both the collisional and collision- 
less regimes. Our review takes a somewhat different approach 
to these previous works. We attempt to review numerical tech- 
niques for both collisional and collisionless Af-body simula- 
tions, focusing on the very latest techniques, and with a view to assessing interesting future research directions. As 
much as possible, we present the key equations and describe the methodology at a level where we hope the reader will 
be able to obtain a relatively deep understanding of the algorithms and their hidden gremlins. 

This paper is organised as follows. In ^ we review numerical methods for collisional A^-body simulations: the 
astrophysical ( fl2.ll) and numerical foundations ( fl2.2p with special treatment of the time integration ( fl2.3p . recent 
hardware-driven developments ( fl2.4p : and give a critique of the current state of the art ( fl2.5p , summarise alternatives 
to A^-body methods ( fl2.6p . as well as present a (brief and biased) overview over past and recent astrophysical results 
with an outlook for the future ( fl2.7p . In SJJI we review numerical methods for collisionless simulations: the astrophysical 
( fl3.lt and numerical foundations ( A3. 21) . the basis of cosmological A^-body simulations ( fl3.3p , force softening ( fl3.4p and 
the various force solvers ( fl3.5p . recent developments and challenges ( fl3.6p . and a very brief overview of astrophysical 
results ( fl3.7p . <2] describes methods for validation of A^-body simulations, and in $3] we present our conclusions and 
outlook for the future of the held. 



2 N-body methods for collisional systems 

Collisional systems are dynamically old such that tdyn is short compared to their age. This applies mainly to massive 
star clusters, and for this reason we will mostly refer to the A^-body particles in this section as stars within a star 
cluster. Such clusters typically orbit deep within the potential of a host galaxy - like our own Milky Way - such that 
their dynamics is affected by the tidal field of their host. 

Over many dynamical times, the accumulated effect of many small encounters between stars significantly affects 
the evolution of collisional systems. Relaxation-driven equipartition of energy causes heavier stars to sink towards the 
centre, while low-mass stars are pushed to the outskirts, where they are susceptible to being skimmed off by Galactic 

4 Many stellar systems contain significant amounts of gas, which in addition to gravity also interact electromagnetically. This 
gives rise to a large number of complicated effects from radiative cooling and the formation of stars (inside of which the strong 
and weak interactions become important, too), to active galactic nuclei and outflows driven by radiative heating. While for 
most gravitational systems these non-gravitational effects play an important role only for brief periods of their lifetime, their 
understanding and appropriate modelling is at the forefront of many contemporary challenges in astrophysics. This is, however, 
beyond the scope of this paper. 
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Fig. 1. The increase in particle number over the past 
50 years for selected collisional (red, I ll |4ldl taken 
from [l^]) and collisionless (blue, fa. I7H9L Il8l424jp TV-body 
simulations. The line shows the scaling N = N 2 (yeaT - yo)/2 
expected from Moore's law if the costs scale oc N. 
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tides. Such relaxation processes provide a mechanism for transporting specific energy outwards, resulting in what is 
called the 'gravothermal catastrophe': a dramatic increase in the central density resulting in core collapse 29 32]. 

In addition to relaxation - which is largely driven by distant encounters - short-range interactions can form bound 
particle pairs. Such binaries are called 'hard' if their binding energy exceeds the typical kinetic energy of surrounding 
stars in the cluster. Close encounters with such field stars result in a further hardening of hard binaries, while soft 
binaries loose binding energy and become even softer (known as 'Heggie's law' [H, (HI). Hard binary interactions are 
of great importance, as they act as a source of kinetic energy |35j . heating the core of the cluster (where most binaries 
have sunk due to their larger mass), counter-acting the relaxation-driven energy flux from the core to the outskirts, and 
thus prolonging the time until core collapse (36l . |37| . Core collapse could perhaps even be reversed by binaries freshly 
formed in three-body encounters (or two-body encounters involving strong stellar tides), a process most likely to occur 
in the very high densities reached during core collapse. The formation and evolution of hard binaries (and higher order 
multiples of stars) presents a unique numerical challenge, because their short dynamical times and extreme forces lead 
to very small integration time steps [25[ ; they require special treatment that we discuss in £12.31 

Another reason binary interactions are important astrophysically is that they provide a route to forming close 
binary stars of a constitution and type unlikely to form under ordinary star-formation conditions. 'Blue Straggler' 
stars, for example, may form in this way [38l - l40| . as well as ultra compact X-ray binaries, which are over-abundant in 
globular clusters [41], and references therein]. 

2.1 Equations governing collisional stellar systems 

The physics of collisional stellar systems is in principle quite simple: the motion of N point masses under their mutual 
gravitational attraction. This is in fact a Hamiltonian system, with a Hamiltonian H and equations of motion 



where pi — rriiXi is the momentum; Xi is the position; and the acceleration of particle i. A-body simulations 

of collisional systems simply try to solve these equations directly by brute force. 

For problems involving massive central black holes, general relativistic (GR) corrections to the force can become 
important. This is because, although we are almost always in the weak field regime, if the black hole dominates the 
central potential, then the potential will be close to Keplerian and super-resonant. GR effects act to break resonance 
by inducing orbital precessions. Over many dynamical times, or if stars orbit very close to the central black hole, such 
effects can become important [U IHj]. We will not discuss such 'post-Newtonian' corrections to the force further in 
this review, but refer the interested reader to Aarseth [42| and references therein. 

2.2 Numerics of collisional N-body simulations 

While equations ([5]) are conceptually straightforward, they are anything but straightforward to solve numerically. One 
obvious problem is that the computational cost for the force calculation for all particles grows as 0(N 2 ), implying 
that realistic simulations with N ~ 10 6 (the number of stars in massive star clusters) are challenging. Such high N 
has only recently been achieved for a small number of simulations using special hardware chips operated in parallel 
@ (see also £12.41) . A second serious problem is the enormous range of time scales, ranging from days for the periods of 
tight binaries^ to millions of years for the orbits of most stars, and 10 10 years for the age of the whole stellar system. 
To make things worse, the time step required to accurately integrate the trajectory of any individual star can change 
considerably along its orbit and abruptly during close encounters. Given this range of formidable problems, it is not 
surprising that there are very few A-body codes which can cope with them. 

As already alluded to, the computation of the forces dominates the computational cost of existing collisional N- 
body codes, all of which use the brute-force direct-summation approach (i.e. a straightforward implementation of 
equations [5]) . This is motivated by the need for an accurate force computation to ensure correct modelling of both 
close and distant encounters. Currently, several approaches are applied to reduce the frequency of force computations 
for any individual star. First, the force is split into contributions from near neighbours and the far field. While the 
former varies on short time scales, it is quite cheap to compute, whereas the latter is expensive to evaluate but much 
smoother in time. Thus, splitting the force into two components like this allows us to reduce the need for the expensive 
computation of the far-field force. This 'Ahmad-Cohen' [451 ] scheme necessarily requires individual timesteps for each 
particle which we discuss in £12.31 

5 The shortest observed binary period is 11 minutes for the X-ray binary 4U 1820-30 in the globular cluster NGC6624 [44| . 
but such very short periods are not simulated in contemporary iV-body simulations. 
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Second, the frequency for computing the acceleration can be further reduced by computing its time derivative, 
the jerk: 

r , x ijXij ~ 3xij(xij ■ Xij) 
di = — G 2_^ m i — i [5 with Xy = Xi — Xj, (6) 



which is used to predict into the future or, equivalently, to employ a higher-order time integrator with a larger time 
step (see . Ever higher order schemes require ever higher derivatives of to obtain forward interpolations of the 
force 0. 



2.3 Time integration 

The accurate time integration of close encounters is the most difficult part of collisional TV-body methods, while 
for collisionless TV-body methods force softening (sec g3.4|) alleviates this problem substantially. Here, we review the 
various time integration methods employed in both types of TV-body methods. Let us begin our considerations by the 
simple 'Euler method', which updates the position and velocity for a given particle by timestep At via 

x(t + At) = x(t) + xAt (4a) 
x(t + At) = x{t) + a(t) At. (4b) 

While conceptually straightforward, this scheme performs very poorly in practice. The Euler method is just a Taylor 
expansion to first order in At and the errors are proportional to At 2 . We can significantly improve on this at little 
additional computational cost either by increasing the expansion order and thus the accuracy, or by integrating a 
'near-by' Hamiltonian exactly using a low-oder scheme. We now compare and contrast a popular example of each type 
of approach: the second-order leapfrog integrator, which is heavily used in collisionless TV-body applications, and the 
fourth-order Hermite scheme, which has become the integrator of choice for collisional applications. 



2.3.1 The Leapfrog integrator 



The leapfrog integrator is an example of a symplectic integrator. Symplectic integrators exactly solve an approximate 
Hamiltonian. As a consequence, the numerical time evolution is a canonical map and preserves certain conserved 
quantities exactly, such as the total angular momentum, the phase-space volume, and the Jacobi constants. The idea 
is to approximate the Hamiltonian H in equation ([2]) with 



H = H + H P 



(5) 



where Hg rr is the error Hamiltonian. Provided that H and H are time-invariant, the energy error is bounded at all 
times [47J3 The goal now is to find H that can be solved exactly by simple numerical means and minimises H evv . 
Defining the combined phase-space coordinates w = (x,p) we can re-write Hamilton's equations as: 



w = %w. 



(6) 



where T~L = {■, H} (with the Poisson bracket {^4, B} = d x A-d p B — d x B-d p A) is an operator acting on w. Equation (O 
has the formal solution 

w(t + At) =c Atn w(t), (7) 

where we can think of the operator e At ^ as a symplectic map from t to t + At. This operator can be split, in an 
approximate sense, into a succession of discrete but symplectic steps, each of which can be exactly integrated. The 
most common choice is to separate out the kinetic and potential energies, H = T(p) + V^a:), such that we can split 



^AtU _ „At(T+V) 



e AtV e AtT 



.AtU 



(8) 



Because the differential operators T = {•, T} and V = {•, V} are non-commutative, the central relation in equation ([5]) 
is only approximately true. This operator splitting is extremely useful, because, while equation (O has in general no 
simple solution, the equivalent equations for each of our new operators do: 



^Atr 



X 




x + Atp 


p. 




P 



and 



x 




X 


p. 




p - AtVV{x) 



(9) 



6 A symplectic integrator which obtains zero energy error is exact. 
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These operations are also known as drift and kick operations, because they only change either the positions (drift) or 
velocities (kick). Note that the drift step in (|8j) is identical to the simple Euler method (|4a|. while its kick step is not 
identical, because the acceleration is calculated using the drifted rather than the initial positions. The integrator that 
applies a drift followed by a kick (equation [5]) is called modified Euler scheme and is symplectic. 

It is clear from the similarity between the simple and modified Euler schemes that both are only first order accurate. 
We can do better by concatenating many appropriately weighted kick and drift steps: 

N 

e Atn = -QgQ.Atv e fc,zvtr = e zit-H+e>(At™+ 1 ) ^ 

i 

with coefficients a, and bi chosen to obtain the required order of accuracy n. From equation (|10[) we see that: (i) the 
approximate Hamiltonian H is solved exactly by the successive application of the kick and drift operations, and (ii) 
H approaches H in the limits At — > or n — > oo. At second order (n = 2), and choosing coefficients that minimise 
the error, we derive the leapfrog integrator: 

e AtU+0(At 3 ) =e \AtV e AtT e ±AtV _ (-q) 

Applying equations (O, this becomes (subscripts and 1 refer to times t and t + At, respectively): 

x = x + ^a At (12a) 

xi = x a + x At (12b) 
xi = x' + \a x At (12c) 

where do = — ^V(x^) and a\ = — VV^cci), while the intermediate velocity x! serves only as an auxiliary quantity. 
Combining equations (|12p we find the familiar Taylor expansions of the positions and velocities to second ordefl 

x\ = x + x At + ^a At 2 , (13a) 
xi = x + |(ao + a±)At. (13b) 

In principle, one may combine as many kick and drift operations as desired to raise the order of the scheme. However, 
it is impossible to go beyond second order without having at least one dj and one bi coefficient in equation (|10j) 
be negative [48l l49l . see also l50l ] . This involves some backwards integration and is problematic when using variable 
timesteps — especially if time symmetry is requirecQ. 

As mentioned previously, variable individual timesteps are a requirement for most TV-body applications. Unfortu- 
nately, once we allow At to vary in space and time, the symplectic nature of the leapfrog is broken. But, we can do 
almost as well as symplectic by ensuring that the scheme remains time symmetric [52j [. One route to time symmetry 
is to solve the implicit leapfrog equations (|13[) using a symmetrised timestep: 



At=±[T(x ,x ,ao...)+T(xx,x 1 ,a 1 ...)], (14) 

where the function T generates the step size depending on the position, velocity, acceleration etc. (we will discuss 
possible functional forms in S I2.3.3I) . Since At is a function of the force evaluated at t + At, which in turn is a function 
of At, such a scheme is implicit and in general requires an iterative solution (53|. Since each iteration involves another 
expensive force evaluation, implicit schemes are not used in practice. Fortunately, there are explicit and time-symmetric 
methods for adapting the time step [54], for example 

At old At ncw = T(x,x,a...) 2 , (15) 

where At a \d and At new are the time steps used to evolve to and from the arguments of T. Clearly, equation (|15p is 
time symmetric by construction and requires no iteration to solve. 



7 This is the kick-drift-kick (KDK) leapfrog. An alternative is the drift-kick-drift (DKD) version. In practice, the KDK is 
preferable, because acceleration and potential are known at the second-order accurate positions (not at an auxiliary intermediate 
position as for the DKD), facilitating their usage in time-step control. Moreover, with the block-step scheme (see £12.3.31 and 
Fig. [3| the KDK results in synchronised force computations for all active particles. 

8 Recently, Chin & Chen [Hl[ have constructed fourth-order symplectic integrators which require only forwards integration. 
To achieve this, rather than eliminate all the errors by appropriate choice of the coefficients en and bi, they integrate one of the 
error terms thus avoiding any backward step. Their method requires just two force and one force gradient evaluation per time 
step. It has not yet found application in A^-body dynamics, but could be a very promising avenue for future research. 
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Fig. 2. Left Comparison of the leapfrog integrator (black); a 4th order Hermite scheme (red); and time symmetric leapfrog 
using variable timesteps (blue) for the integration of an elliptic (e = 0.9) Kepler orbit over 100 periods. In the first two cases a 
fixed timestep of 0.001 of the period was used; in the latter case we use the timestep criteria in equation Q20p . Middle Fractional 
change in energy for the Kepler problem for the leapfrog integrator with fixed timesteps (black), variable timesteps (equation 
1201 red), and symmetric variable timesteps (blue). Right Fractional change in energy for the Kepler problem for the 4th order 
Hermite integrator with fixed timesteps (black) variable timesteps (equation [23 red), and variable timesteps (equation 1261 red 
dotted). The blue curve shows the energy error for the same Kepler orbit calculated using the K-S regularised equations of 
motion (see text for details). All calculations with variable timesteps were run at the same computational cost (~ 250 force and 
jerk evaluations per orbit, which is about a quarter of the cost of the fixed-timestep calculations). 



In the middle panel of Fig. [2j we compare energy conservation for the leapfrog using fixed timesteps, variable 
timesteps, and time symmetric variable timesteps of equation (|15l) for a Kepler orbit with eccentricity e = 0.9. With 
fixed timesteps (black) the energy fluctuates on an orbital time scale, but is perfectly conserved in the long term; 
with variable timesteps, manifest energy conservation is lost (red); while with the time symmetric variable time step 
scheme, we recover excellent energy conservation (blue). The time symmetric variable time step leapfrog used about a 
quarter of the force calculations required for the fixed-step integration while giving over an order of magnitude better 
energy conservation. This is why variable timesteps are an essential ingredient in modern A^-body calculations. 



2.3.2 Hermite integrators 

The leapfrog integrator is a popular with collisionless TV-body applications because of its simplicity, manifest energy 
conservation, and stability. However, its integration errors on short time scales make it less useful for studying collisional 
systems, where one must correctly track chaotic close encounters, and such errors would rapidly ruin the integration. 
Shrinking the timestep helps, but as the leapfrog is only a second-order scheme, the step sizes required become 
prohibitively small. This motivates considering higher-order non-symplectic integrators for collisional applications. 

The current state of the art are fourth-order non-symplectic integrators, so called Hermite schemes (for higher 
orders see 46] ) . From the Taylor expansion of the position and its time derivatives at time t + At 



xi = x + x At + \a^At 2 + ±a At 3 + ^a Zii 4 , (16a) 

xi = x + a At + ±d Z\i 2 + \a At 3 + ±a a At 4 , (16b) 

ai = a + a At + \a At 2 + \a Q At z ', (16c) 

di = d + a At + \ auAt 2 , (16d) 

we can eliminate do and d'o to obtain 

xi = x Q + + x Q )At + ±(a - a x )At 2 + 0(At 5 ), (17a) 

±i = x Q + i(ai + oq) At +±{a - a^At 2 + 0(At 5 ). (17b) 



These equations are not only fourth-order accurate but also time symmetric (though not symplectic) and will therefore 
give excellent energy conservation. The only snag is their circularity: in order to obtain aq and iq we need to know 
the acceleration a and jerk d not only at time t (where we can readily compute them from equations [5] and [3]) but also 
at time t + At, when they in turn depend on aq and iq. In order words, equations (1171) define an implicit scheme. In 
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practice, this difficulty is side-stepped by first predicting positions and velocities 

x p = x + x At + \a a At 2 + \a At s , (18a) 
x p = x + a At + ±a Z\t 2 ; (18b) 

then estimating acceleration a\ and jerk a,\ using equations © and ([3]) with the predicted positions and velocities; 
and finally obtaining the corrected X\ and X\ from equations (If 71) . A single iteration of this method is is called a 
Predict-Evaluate- Correct (PEC) scheme; further iterations are denoted P(EC) n , with n the number of iterations [55| . 
In the limit n — > oo, we converge on the implicit Hermite solution (|I7[) . In practice, implicit integration schemes are 
not employed because of the numerical cost of calculating the acceleration and jerk over several iterations. However, 
unlike the implicit Hermite scheme, the explicit PEC scheme is not time symmetric. 

A comparison of various flavours of the 4th order Hermite integrator are given in Figure [2j for the integration of an 
elliptical Kepler orbit with e = 0.9 over 100 orbits. Notice that the leapfrog integrator with fixed timestep conserves 
energy exactly (in the long-term), but that the peak of the oscillations is initially ~ two orders of magnitude worse 
than for the 4th order Hermite scheme with fixed steps (compare black lines in middle and right panels). Over time, 
the energy losses accumulate for the Hermite integrator, causing the apocentre of the orbit to decay, but the orbital 
precession (both are numerical errors) is significantly less than for the leapfrog (compare black and red orbits in the 
left panel). It is the orbital stability and excellent energy accuracy that have made Hermite integrators popular for 
use in collisional A-body problems. 



2.3.3 The choice of time-step 

Given the enormous dynamic range in time involved in collisional A-body problems (ranging from days to giga- years) , 
it has become essential to use variable timestep schemes (25j . Early schemes used an individual time step for each 
particle. However, it is better to arrange the particles in a hierarchy of timesteps organised in powers of two, with 
reference to a 'base step' Ato [56j : 

At n = At /2 n (19) 

for a given timestep rung n. Particles can then move between rungs at 
synchronisation points as shown in Fig.[3j This block-step scheme leads 
to significant efficiency savings because particles on the same rung arc 
evolved simultaneously. However, time symmetry with block stepping 
presents some challenges [57l |. A key problem is that, in principle, 
particles can move to lower timestep rungs whenever they like, but 
they may only move to higher rungs at synchronisation points where 
the end of the smaller step overlaps with the end step of a higher 
rung (see Fig. [3]). This leads to an asymmetry in the timesteps, even 
if some discrete form of equation (fT5|) is used. Makino et al. [57j show 
that it is possible to construct a near-time symmetric block time step 
scheme, provided some iteration is allowed in determining the time 
step. Whether a non-iterative scheme is possible remains to be seen. 

We now need some criteria to decide which rung a particle should 
be placed on. For low-order integrators like the leapfrog, we have only 
the acceleration to play with. In this possible timestep criterion 

can be found by analogy with the Kepler problem: 

Ati=<ny/\$r\/\ai\, (20) 

where <Pi is the gravitational potential of particle i, and r] is a 
dimensionless accuracy parameter. Substituting <Pi = GM/fi and 
| Oi| = GM/rf, valid for a particle at radius orbiting a point mass 
M, we see that this gives AU = r]^/rf /GM, i.e. exactly proportional 
to the dynamical time. However, a timestep criteria that depends on 
the potential is worrisome since the transformation <P — > <P + const, has no dynamical effect, but would alter the 
timesteps. In applications like cosmological A-body simulations, where the local potential has significant external 
contributions, simulators have typically employed: 



no 



ni 



n2 




Ato/2 



Ato/4 



Fig. 3. Schematic illustration of a block 
time-stepping scheme. Particles are organised on 
timesteps in a hierarchy of powers of two rela- 
tive to a base time Ato- The time step level, de- 
noted no, 1,2... is called the timestep rung. Particles 
can move up and down rungs at synchronisation 
points marked by the red arrows. 



At. t = r]y/e/\ai\ 



(21) 
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and similar, where e is the force softening length (see $3]). Equation (|21l) is really only defined on dimensional grounds: 
it creates a quantity with dimensions of time from a local length scale — the force softening — and the local acceleration. 
It is clear that this time step criteria would be of no use for, say, the Kepler problem, where it will lead to too small 
steps at large radii, and too large steps at small radii. 

In a recent paper, Zemp et al. [58[ have attempted to solve the above conundrum by trying to determine what a 
particle is orbiting about. If this is known, then the dynamical time itself makes for a natural timcstcp criteria: 



nfo/GMin), (22) 



where M(r^) is the mass enclosed within the particle's orbit from some 'attractor' at distance rv. lZemp et al.l attempted 
to define a M(r) based on information taken from a gravitational tree structure (see Sj3]). Such ideas lend themselves 
naturally to collisionless simulations, where a tree is often readily available as a by-product of the force calculation (see 
£13.5.11) . But it remains to be seen if such a timestep criteria can be competitive for collisional A^-body applications. 
Unlike many collisionless applications, in collisional A^-body applications, it is often well-defined from the outset what 
particles are orbiting about — at least until close interactions occur when case special treatment is required anyway. 
In addition, the higher-order integrators typically employed provide a wealth of additional 'free' information that can 
be used to determine the timestep. The fourth order Hermite integrator, for example, gives us a, a and a (the latter 
two from finite differences). Such considerations have motivated higher-order time-stepping criteria. 

For example, an immediately obvious choice for a higher-order criterion is to set the time step based on the 
truncation error in the Hermite expansion: 

AU= (ryKI/la,!) 172 or At { = | / 1 a, |) 1/3 (23) 
or to use the error in the predictor step, as suggested by Nitadori & Makino [46]: 

AU = At oUti (r)\ai\/\ai - a Ptl \) 1,p (24) 

where p is the order of the expansion, a p ^ the predicted acceleration, and Z\£ id,i the previous timestep. However, 
while such criteria seem sensible, they are all out-performed by the seemingly mystic Aarseth [25^ criterion 

Mi = L±p±ipL) 1 ' 2 (25) 

V \ai\\ai\ + la^J 

with r\ ~ 0.02 (which can be generalised for higher-order schemes, see [4(1 1591]). 

The success of equation (f2"5| probably lies in the fact that it conservatively shrinks the time step if either a or a 
are large compared to the smaller derivatives, and it requires no knowledge of the previous timestep. Equations (|23[) 
give poorer performance because they do not use information about all known derivatives of a. Equation (|24|) gives 
poorer performance for large timesteps. It too uses information about all calculated derivatives of a (since it is based 
on the error between predicted and true accelerations). But the problem is that it relies on the previous timestep for 
its normalisation. If At a \d,i is too large, the criteria will not respond fast enough, leading to overly large timesteps and 
large energy losses |46| . 

The above suggests that a conservative 'truncation' error-like criteria that encompasses all derivatives of a might 
perform at least as well as the Aarseth criteria while being (perhaps) more theoretically satisfying: 





4 ti = nm > hl^T >->K^r (26) 



where p is the highest order of a calculated by the integrator; and oW is the p th derivative of a. 

In the right panel of Fig. [3J we compare 4th order Hermite integrators with different variable timestep criteria for 
a Kepler orbit problem with e = 0.9. The black curve shows results for a fixed timestep with At = 0.001 periods; the 
red curve shows results using a variable timestep and the Aarseth criteria (equation I25[) ; and the red dotted curve 
shows results using equation (|2"o| . For the Aarseth criteria we use r\ = 0.02; for our new criteria in equation (|2"o| we 
set r\ in all cases such that exactly the same number of steps are taken over ten orbits as for the Aarseth criteria. 
The 'truncation' error-like criteria fequation l26l) appears to give very slightly improved performance for the same cost. 
However, whether this remains true for full A^-body applications remains to be tested. 

The above timestep criteria have been well tested for a wide range of problems and so appear to work well — at 
least for the types of problem for which they were proposed. However, there remains something unsatisfying about all 
of them. For some, changing the velocity or potential can alter the timestep and, with the possible exception of the 
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criterion by Zemp et al. [581 ] . all are affected by adding a constant to the acceleration. This is unsatisfactory, since 
the internal dynamics of the system is not altered by any of these changes. Applying a constant uniform acceleration, 
generated for example by an external agent, to a star cluster is allowed by the Poisson equation and does not alter the 
internal dynamics, and thus should not drastically alter the timesteps. Only if the externally generated acceleration 
varies across the cluster does it affect its internal dynamics, an effect known as tides. This suggests 

At^iv/WiVaW 2 (27) 

where Va is the gradient of the acceleration and 1 1 • 1 1 denotes the matrix norm. Remarkably, for the Kepler problem this 
agrees with equation (l20l) , while for isolated systems with power- law mass profiles it is very similar to equation (1221) . 
However, computing the gradient of a merely for the sake of the timestep seems extravagant. 



2.3.4 Close encounters and regularisation 

A key problem when modelling collisional dynamics is dealing with the divergence in the force for Xi — > Xj in 
equation requiring prohibitively small timesteps (or large errors) with any of the above schemes. Consider our 
simple Kepler orbit problem. For a timestep criteria as in equation (|20j) . this gives a timestep at pericentre r p of 
At 2 <x rt/GM, Thus, for increasingly eccentric orbits, the timesteps will rapidly shrink, leading to a few highly eccentric 
particles dominating the whole calculation. To avoid this problem, collisional .ZV-body codes introduce regularisation 
for particles that move on tightly bound orbits. The key idea is to use a coordinate transformation to remove the force 
singularity, solve the transformed equations, and then transform back to physical coordinates. Consider the equations 
of motion for a perturbed two-body system with separation vector R = X\ — X2 (using R = \R\): 



R 

R = -G(mi + m 2 )— + F 12 , 
R A 

where Fi 2 = Fi — F2 is the external perturbation. This, of course, still has the singularity at R 
the time transformation dt — Rdr: 



(28) 



0. Now, consider 



R" = ^-R'R 1 
R 



G(mi 



R 

m 2 )- 



R F\ 



(29) 



where ' denotes differentiation w.r.t. r. Note that we have removed the R~ 2 singularity in the force, but gained another 
in the term involving R' . To eliminate that, we must also transform the coordinates. The current transformation of 
choice is the Kustaanheimo-Stiefel (K-S) transformation [25|, [6(| [6l|] , which requires a move to four spatial dimensions. 
We introduce a dummy extra dimension in R = (R\, R2, i?3, R4), with R4 = 0, and transform this to a new four vector 
u = [u\, U2, U3, U4) such that R — C{u)u, with: 



£ = 



ui -u 2 —U3 U4 
U 2 Ui -14 4 -U3 
U 3 U 4 Ul u 2 
U 4 -U 3 1t 2 — Ul 



(30) 



The inverse transformation is non-unique, since one of the components of u is arbitrary. In general, we may write: 



u\ 



u\ 



i(i?i + R) cos 2 '0; u 2 



-{Ri +i?)sin 2 V>; u 3 



R2U1 + R3U4 
i?i + R 

R3U1 - R2U4 



(31a) 



(31b) 



Ri+R 

where ijj is a free parameter. It is a straightforward exercise to verify that equations (|3"Tj) satisfy the transformation 
equation R = C(u)u. We also require a transformation between the velocities R and it'. Writing R' — C(u')u + 
C(u)u' = 2£(u)u', and using the relation C T £ = RI gives: 



1 /-T -ft' _ 1 r T p 

2 C T ~ 2 C R 



(32) 



where the last relation follows from the time transformation dt — Rdr. Substituting the K-S coordinate transform 
into equation (|2"9"|) gives (25j : 

u" - -Eu = -RC T Fi2 (33a) 
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where E is the specific binding energy of the binary, which is evolved as: 



E' = 2u' ■ C T F 



12- 



(33b) 



(Note that the transformed time is given by t' — \u\ 2 , which follows from equation 121)1) We can now see two important 
things. Firstly, there are no longer any coordinate singularities in equations (|3"3"|) . Secondly, in the absence of an external 
field (F12 = 0), E — const, and our transformed equations correspond to a simple harmonic oscillator. 

We can evolve the above regularised equations of motion using the Hermite scheme f £12.3.2p . so long as we can 
calculate u'" and E". These follow straightforwardly from the transformed time derivatives of equations ([33)) : 



where Q — C T F\2 describes the external interaction term. 

In Figure^c), we show results for a Kepler orbit with eccentricity e = 0.9 integrated over 100 orbits using the K-S 
regularisation technique (blue). We use a Hermite integrator with variable timesteps, and timestep criterion (1261) . For 
as many force calculations as the variable timestep Hermite integration scheme, the results are over 100 times more 
accurate. This is why K-S regularisation has become a key element in modern collisional iV-body codes. 

K-S regularisation as presented above works only for a perturbed binary interaction. However, it is readily gener- 
alised to hi ghe r order interactions where for each additional star, we must transform away another potential coordinate 
singularity [25l . [62l l63l | . In practice, this means introducing N coupled K-S transformations, which requires 4:N(N— 1) + 1 
equations, making extension to large N inefficient. For this reason, chain regularisation has become the state-of-the 
art [64|,|65j]. The idea is to regularise only the close interactions between the N particles, rather than all inter-particle 
distances, which reduces the number of equations to just 8(AT — 1) + 1, paving the route to high N. For interactions 
involving large mass ratio, other regularisation techniques can become competitive with the K-S chain regularisation 
jrJrj . This is particularly important for interactions between stars and supermassive black holes. 

2.4 Recent numerical developments 

Many of the key algorithmic developments for collisional A^-body simulations were advanced very early on in the 
1960's and 1970's [aHH Uli 63]. As a result, the field has been largely driven by the extraordinary improvement in 
hardware. From the early 1990's onwards, the slowest part of the calculation - the direct Af-body summation that 
scales as N 2 - was moved to special hardware chips called GRAPE processors (GRAvity PipE; Ito et al. [67j )■ The latest 
GRAPE-6 processor manages an impressive ~ ITerafiop (68|, allowing realistic simulations of star clusters with up to 
10 5 particles [l6|. However, to move toward the million star mark (relevant for massive star clusters), several GRAPE 
processors must be combined in parallel. This became possible only very recently with the advent of the GRAPE-6A 
chip [6{|. The GRAPE-6A is lower performance (and cheaper) than the GRAPE-6, but specially designed to be used in a 
parallel cluster. Such a cluster was recently used by Harfst et al. @ to model a star cluster with A^ = 4 x 10 6 . 

The GRAPE processors have been invaluable to the direct A^-body community, and with the recently developed 
GRAPE-DR, they will continue to drive the field for some time to come [7(|. However, concurrent with the further devel- 
opment of the GRAPE chips, significant interest is now shifting towards Graphical Processor Units (GPUs) for hardware 
acceleration. This is driven primarily by cost. Even the smaller and cheaper GRAPE-6A costs several thousand dollars 
at the time of writing and delivers ~ 150 GigaFlops of processing power. By contrast GPUs deliver ~ 130GigaFlops 
for just a couple of hundred dollars. The advent of a dedicated A^-body library for GPUs makes the switch to GPUs 
even easier [7l|. Whether the future of direct A'-body calculations lies in dedicated hardware, or GPUs remains to be 
seen. A third way is entirely possible if new algorithms can make the force calculations more efficient. We discuss the 
prospects for this, next. 

2.5 Critique and numerical alternatives 

One of the main problems with contemporary A-body codes for collisional stellar systems is their great similarity. An 
immediate consequence is that the usual method for the validation of simulation results (see @ by comparing inde- 
pendent approaches is hardly possible. In addition, splitting the force computation into near and far field components 
is intimately connected with the time integration such that one cannot simply change one without the other. 

The requirement for an accurate force computation by no means implies the need for the costly brute-force approach 
currently employed. Alternatively, an approximative method with high accuracy, such as the fast-multipole method 
(see ^3.5.1|) . requires only 0{N) instead of 0{N 2 ) operations to compute the forces for all N particles. 

Time integration could perhaps also be improved. The Hermite scheme typically employed is neither symplectic 
nor time-reversible. While this does not necessarily imply that the time integration method causes uncontrolled errors, 



u 



- {E'u + Eu' + R'Q + RQ') , E" = 2u" ■ Q + 2u'Q' 



(34) 
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it would certainly be desirable to compare to an alternative method. One interesting option is to use fourth-order 
forward symplectic integrators ([5l|, see also footnote [8]) in conjunction with a time-symmetric method for adapting 
the time steps, resulting in an overall time-reversible scheme. 



2.6 Alternatives to N-body simulations 

Collisional stellar systems are typically in dynamical, or virial, equilibrium where their overall properties, such as the 
mean density and velocity distributions of stars, remain unchanged over dynamical time scales. However, owing to 
stellar encounters, the systems evolves on much longer time scales. TV-body methods follow the dynamics on all time 
scales and thus do not exploit the fact that the system is almost in dynamical equilibrium. 

An alternative is to use a mean-field approach where a dynamical equilibrium is evolved to another dynamical 
equilibrium, using some prescription for the processes, such as stellar encounters, driving this evolution. This is the 
gist of Fokker-Planck codes, which approximately solve the collisional Boltzmann equation 

dt - at +x dx dx dx~ 1[n - {6b > 

Here f{x,x,t) is the density of stars in six-dimensional phase-space {x, i}, also known as the distribution function 
(see £l3.1l for more on this). The encounter operator r[f] describes the interaction between stars. In the limit r[f] — > 0, 
equation (|35|) recovers the collisionless Boltzmann equation (j39|) . In general r[f] is a complicated non-trivial functional 
of f(x,x,t). When replacing r[f] with an approximation obtained using certain simplifying assumptions, we obtain 
the Fokker-Planck equation (FPE, not given here, see [l| for more details). The combined solution of the FPE and 
the Poisson equation (1381) is still a formidable problem: solving it in six phase-space coordinates plus time is simply 
unfeasible. Several methods have been employed to tackle this problem, very briefly summarised below. 

Orbit averaging averages the net effect of the FPE over each orbit (assuming they are regular), thus reducing from 
six to three dimensions. The resultant equations may then be solved on a mesh. Following the pioneering work of Cohn 
typically this has been done in two dimensions, integrating the specific energy and z-component of the angular 
momentum (e.g. (73[, we are unaware of any fully 3D calculation). 

Monte-Carlo methods, pioneered by Henon [74j], sample the stellar system using tracer particles, whose trajectories 
are followed including not only the background potential but also the diffusion in velocity according to the FPE. 
Current implementations assume spherical symmetry to boost the resolution (for a recent example see (75|). 

Fluid Models take velocity moments of the FPE, resulting in fluid- like equations [76| . This avoids the need to orbit- 
average the FPE, but (1) implicitly assumes that scattering events are local and (2) requires some assumption on the 
distribution function to close the hierarchy of moment equations (i.e. an effective equation of state). 

Despite the large number of assumptions that go into workable Fokker-Planck codes (independent local 2- body 
encounters only, assumed Coulomb logarithm, etc.), the agreement with full iV-body models is remarkable [77h79| . 
Currently, with such a large number of assumptions Fokker-Planck codes mainly increase our understanding of the 
iV-body simulations, rather than act as an independent cross-check of the results. However, given the rapid advance 
in computational power, it is perhaps time to revisit this approach. One may also consider alternatives to the FPE 
(but still based on equation [35]) , for example similar to the Balescu [8(|-Lenard [8l[ equation for plasma physics (82j. 
There remain significant numerical challenges to such alternative approaches, but they hold the promise of a robust 
method for solving collisional iV-body dynamics that relies on very different assumptions to the iV-body method. 



2.7 Past, recent, and future astrophysical modelling 

While the focus of this review rests firmly on numerical methods, we briefly discuss our personal highlights of previous 
astrophysical results, as well as recent and possible future developments. 

Perhaps the earliest result was the numerical demonstration of core- collapse [5j. which inspired the theory of 
gravothermal- catastrophe [3(j. Since then the increase in N to nearly 10 5 [Til . [l5l [ipl l83l| has confirmed the onset of 
gravo-thermal oscillations after core-collapse, first discovered using a fluid code |84| . Exploring the effect of binaries, 
several studies found that binaries may delay or even reverse core collapse (37l . l85l. |86|| . 

Apart from being fascinating from a theorist's point of view, core collapse may explain the rich variety of Globular 
Clusters observed in our Galaxy (§3, as well as potentially raising the central density enough to promote stellar 
collisions. This latter process can seed the onset of runaway growth, leading to the formation of intermediate mass 
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black holes (IMBHsjfl [sH [S3 ■ The firm detection IMBHs in star clusters in the basis of dynamical evidence is quite 
challenging [95j, [96| and requires understanding of the collisional cluster dynamics (97l . [98| . 

The most important force acting in collisional A^-body systems is gravity, and solving the gravitational force 
equation has been the focus of this review so far. However, many other interesting physical processes are at play 
within the stars. Stars evolve over time, moving off the main sequence onto the giant branch and then eventually 
ending their lives as stellar remnants [99(. Binary sta rs have e ven more complex lives, and all sorts of interesting 
physics can occur due to mass transfer bet ween them [lOOl Il0l| . In the very dense centres of star clusters, physical 
stellar collisions can drive stellar evolution [102M104 1 . And finally, the dearth of observed gas in star clusters may 
prov i de in teresting constraints on stellar evolution models, in which case gas must also be included in cluster models 

[ioaiioej . 

The first attempt to mesh astrophysical processes with A^-body models was presented in 1987 by Terlevich jl07l ]. 
Since th en, the nu mber of stars modelled and the complexity of the stellar and binary evolution models has continued 
to grow [1081 - 41 1 lj - Recently, there has been a dedicated drive to combine different physical simulation codes together 
within one framework. Thi s is the goal of the MODEST collaboration and its off-shoots03 [Modelling and Observing 
DEnse STellar systems; H3, lllMl5j . 

With ever-improving hardware and software, the challenges in modelling TV-body systems will shift from solving 
gravity, to building ever more realistic models for the physics beyond the particle resolution (stellar evolution, binary 
evolution, collisions, gas physics, and feedback processes from stars and stellar remnants). Understanding these pro- 
cesses, and building believable models, which can explain objects such as the 11 minute X-ray binary 4U 1820-30 (see 
footnote [SJ as well as their distribution and evolution, will become a new frontier of computational astrophysics in the 
coming decades. 



3 N-body methods for collisionless systems 

In collisionless stellar systems the long-term effects of two-body encounters are negligible. In other words, the gravi- 
tational potential governing the stellar motions, which is actually the sum of many individual point-mass potentials, 
is well approximated by a smooth mean potential ^(x, t). If unperturbed, a collisionless stellar system quickly (within 
a few dynamical times) settles into dynamic equilibrium, when changes in the mean density and potential become 
negligible and no further evolution occurs. In this situation, the virial theorem 

2T + W = 0, with T = Y li \rn i xl and W = ^ x, ■ Xi (36) 

holds and one speaks of virial equilibrium. Because relaxation in the thermodynamic sense does not occur, such 
virial equilibria have generically non-Maxwellian and anisotropic velocity distributions, corresponding to a tensor-like 
pressure in the fluid picture. 

The main exponent of collisionless stellar systems are galaxies and systems of galaxies (clusters and the universe 
as a whole), which have number densities too small and dynamical times scales too long for stellar encounters to 
be important. However, whereas for collisional systems external perturbations are usually weak, such perturbations, 
including galaxy encounters and mergers, are frequent and often significant for collisionless system. As a consequence, 
collisionless systems are frequently perturbed away from equilibrium, resulting in evolution to a new equilibrium. 
Another related process is secular evolution, when the perturbation originates from an instability of the system itself. 



3.1 Equations governing collisionless stellar systems 



Because the graininess of the stellar dynamics has negligible effect, collisionless stellar systems are commonly described 
using continuum methods. The fundamental quantity describing the state of the system at any time is its distribution 
function f(x, x, t), which is the mass density of stars (in the continuum limit) in six-dimensional phase-space {x, x} at 
time t. This is a significant simplification compared to collisional systems (whose phase-space is 6A~ dimensional) and 
as a consequence any correlations between particles, such as binaries and encounters, are ignored 1 " 1 ! The continuous 

9 There are a number of stellar-mass black-hole candidates 18811 . as well as super- massive black- hole candidates (~ 10 6-9 Mq , 
found at the centres of galaxies like our own Milky Way, e.g. [83, However, there is no confirmed discovery of an IMBH 
with ~ 10 3-4 Mq, which are difficult to detect unambiguously, but would provide a missing link [oil. |92T|. 

10 See also http://www.manybody.org/modest/ 

11 In other words, the BBGKY hierarchy (after their discoverers Bogoliubov, Born, Green, Kirkwood, and Yvon, see also 0|) 
of the 1-body, 2-body, and higher order distribution functions is truncated at the lowest order. 
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Fig. 4. The effect of mixing demonstrated with simple phase-mixing of 10 4 points in the Hamiltonian H — |p 2 + \q\. This 
corresponds to massless tracer particles orbiting a central point mass in one-dimensional gravity (where phase-space is two- 
dimensional). The fine-grained distribution function is either 1 or 0, but at late times a smooth coarse grained distribution 
appears. Note that dynamical mixing in 3D self-gravitating systems is much faster and stronger than with this toy model. 



spatial density and gravitational potential generated by the system are obtained as 

p(x,t) — J dxf(x,x,t), (37) 

$(x,t) = -G [ dx'^% = -G [[ (38) 
J \x-x'\ JJ \x-x'\ 

respectively. Because, according to Liouville's theorem, phase-space volume remains constant along the flow and 
because of mass conservation, the ratio / = dA//d{a;, x} is constant too, thus satisfying 

Q = d -L = d -L + x . d -L- d -^. d -L, (39) 

dt dt dx dx dx ' 

known as the collisionless Boltzmann equation (CBE). Here, we have used the equation of motion x = — V^tot with 
the total potential <P to t = <P + <P xt , which includes contributions from external agents not modelled by the distribution 
function /. The evolution of a collisionless system is thus governed by the CBE in conjunction with the Poisson 
equation (|38[) and, possibly, an external potential. 

For equilibrium systems df /dt — and thus d<P/dt — 0. In this case, any distribution function which depends 
on the phase-space co-ordinates only through isolating integrals of motiorP^ solves the CBE. This is known as the 
Jeans theorem and allows the construction of simple equilibrium models. Unfortunately, the most general collisionless 
equilibria are triaxial and even though most of their orbits are regular and respect three isolating integrals, only the 
orbital energy allows simple treatment. Therefore, the Jeans theorem is of practical usage only for systems of higher 
symmetry, where the angular momentum (or one of its components) is also an isolating integral. 

For near-equilibrium systems, approximate solutions can be obtained by writing / = /o + /i (with fx <C /o and 
fo describing a given equilibrium model), and ignoring the term of the CBE quadratic in f\. The resulting linear 
perturbation analysis gives insight into the stability properties of simple equilibrium models. However, most galaxies 
frequently undergo strong perturbations and considerable deviations from equilibrium, when these methods fail and 
instead a full numerical treatment is required. 

The direct numerical solution of the CBE, a non-linear PDE in seven dimensions, is not feasible. This is not just 
because of the vastness of six-dimensional phase-space and the inhomogeneity of / (typically most of the mass resides 



12 Any function I(x,x), such that 7 = and the constraint /(as, x) = Io reduces the dimensionality of phase-space by 
one (isolates) is an isolating integral of motion. Examples are the orbital energy for static potentials, or the orbital angular 
momentum in case of a spherical potential. 
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in a tiny fraction of the bound phase space). A much more severe problem is that under the CBE / develops ever 
stronger gradients, even when evolving towards equilibrium. This is because / is conserved along the flow, so that 
initial fluctuations are not averaged away, but mixed leading to ever thinner layers of different phase-space density. 
These require ever higher numerical resolution even though the system hardly evolves observably, as demonstrated 
in Fig. 2J This figure also demonstrates that the mixing invalidates the continuum limit: at late times the concept 
of the distribution function becomes useless. An alternative is to average over the fluctuations by considering the 
'coarse-grained' phase-space density /, a local mean of /. However, for / no evolution equation exist J*^l. 



3.2 Numerics of collisionless N-body simulations 

Fortunately enough, the above problems are easily overcome by the usage of A-body techniques. The basic idea is to 
model the distribution function by an ensemble of A phase-space points {xi, Xi}, i = 1 . . . N with weights fa, which are 
randomly chosen to represent f(x, x,t = 0). The conservation of / along the flow implied by the CBE then means that 
the weights fa remain unchanged along each trajectory, such that the task is reduced to integrating all A trajectories. 
Thus, in one sense an A-body code solves the CBE by the method of characteristics for solving PDEs — the particle 
trajectories are the characteristics of the CBE. In another sense, an A-body code is a Monte-Carlo technique: any 
initial sample of A phase-space points drawn from the same distribution function at t = results in another, equally 
valid, A-body model for the time evolution of f(x, x, t). An important consequence is that the A simulated particles 
are not modelling individual stars, nor is it helpful to consider them as 'super stars' of enormous mass. The only 
correct interpretation is that the ensemble of all A particles together represents the continuous distribution function 
/ (and in fact provides an implied coarse-graining dependant on the numerical resolution). Thus unlike the situation 
for collisional A-body methods, the number A of particles is a numerical parameter, controlling the resolution and by 
implication the accuracy for certain predictions of the model. 

With the A-body method one has no knowledge of the distribution function / at t > 0, except at the phase-space 
positions of the A particles (where / remains at its original value by virtue of the CBE) . However, this is not a problem 
at all, not least because the structure of / is not necessarily very useful (see above), but also because any moments of 
the distribution function can be estimated from the A particles in the usual Monte-Carlo way 

(9)(t) = JJ AxAxg{x,x) f(x,x,t) « ^ fag{x l {t),x i {t)) . (40) 

i 

Here, the function g specifies the moment, for example g = 1 obtains the total mass, while g — ^x 2 results in the total 
kinetic energy. Such moments are in fact all one ever wants to know: all observable properties, including the coarse- 
grained distribution function, are just moments of /. However, one must not forget that this estimation procedure is 
always subject to shot noise, the amplitude of which depends on the number of particles effectively contributing, and 
hence on the numerical resolution and the width of the moment function g. 



3.3 Cosmological N-body simulations 

For cosmological simulations, in principle we should switch to general relativistic (GR) equations of motion. However, 
we can take advantage of a very useful approximation. On large scales, the Univers e is very nearly isotropic and 
described well by a smooth Friedmann-Lamaitre- Robertson- Walker (FLRW) spacetime [116j . However, on small scales 
the Universe must tend towards a locally inertial frame in which Newton's laws are valid. This fact can be used to 
show that the cosmological expansion has essentially no effect on the local dynamics, even on galaxy cluster scales 



117j . Thus, we may think of the Universe as being filled with a se lf-g ravitating fluid that locally obeys Newton's law 
of gravitation, while expanding as an FLRW metric on large scaleO- This leads to the following Hamiltonian for any 
orbiting test-particle [l22l Il23j : 

^ = A + ?^), (41) 
2m ? a z la 



13 One option is to use the CBE also for / and combine it with some averaging or coarse-graining operation, depending on the 
level of fluctuations developing under the CBE. However, the choice of an appropriate coarse-graining operation is a fundamental 
problem with such an approach, partly because phase-space has no natural metric. One would have to use / itself to define 
the coarse-graining and it is not clear how this should be done, in particular for cold models, when / is non-zero only on a 
hypersurface. This is the common situation with initial conditions for cosmological simulations. 

14 Recently, doubts have been raised about the validity of this approximation. Various authors suggested that local inhomo- 
geneities affe ct the lar ge-scale dynamics, for examp le m imicking cosmic acceleration which traditionally has been attributed to 
dark energy [llsL 1 1 1Q| ] . This seems to be unlikely [l20f ] , though the necessary correc tion s due to inhomogeneities could affect 
attempts to use iV-body simulations to precisely determine cosmological parameters |l2l| . 
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where X{ and pi = a 2 iriiXi are now co-moving canonical coordinates (with respect to the isotropic, smooth, expanding 
background spacetime); a = a(t) is the scalefactor that follows from the FLRW model; and the equations of motion 
follow in the usual way from Hamilton's equations. Since we are now working in co-moving space, the potential is 
really the peculiar potential with respect to the smooth background; while the forces are peculiar forces with respect 
to the expansion. Note that the above Hamiltonian now explicitly depends on time and thus doe s no t conserve energy. 
This is the standard problem of ambiguous energy conservation in GR [for a discussion see e.g. Il22j |. 

The usual method for approximating isotropy and homogeneity of the universe on large scales is to simulate a small 
cubic patch of size L of the universe (to satisfy the 'local- Newtonian' approximation), and apply periodic boundary 
conditions in co-moving co-ordinates 

*(x,t) = -G W d*' M+^A = -GV //did,' / , (3;, + nL ^ ) , (42) 
y ' ^ J \x-x'-nL\ „ JJ \x-x'-nL\ 

where the s um o ver n = (n x ,n y ,n z ) accounts for all periodic replica. In practice, the periodic sum is approximated 
using Ewa ld |l24| 's method, which was originally invented for sol id-st ate physics and imported to this field by Hernquist 
et al. [125| (but note an error in their eq. 2.14b as pointed out by [126| ]). Alternatively, Fourier methods, which naturally 
provide periodic boundary conditions, can be used, see H3.5.2I 

3.4 Force softening 

In order to obtain the particle trajectories, one just has to integrate the equations of motion Xi = — V^tot(^i) for all N 
particles. This is usually done using the leapfrog integrator possibly with individual timesteps arranged in the block-step 
scheme ( ^2.3.ip . The self-potential $ must be estimated from the positions and masses of the particles themselves. By 
virtue of equations ((38|) or (|42|) . <P is a moment of / and so we can estimate it. However, the straightforward application 
of equation (|4"0)) results simply in the equations of motion for a collisional system with N stars of masses fa (and 
possibly immersed in an external gravitational field). This is not what we want to model and, moreover, integrating 
these equations numerically is rather difficult, as we have discussed in ^2. 31 above. 

The problem is that the function g = —G/\x — x'\ for estimating the potential via equation (|4T)]) is quite localised 
and even diverges for x — > x' , such that shot noise becomes a serious issue with the simple Monte-Carlo integration 
of equation (|4T)|) . This is closely related to estimating the spatial density p. The application of equation (|4T)|) with 
g = S(x — x 1 ) obtains a sum of ^-functions at the instantaneous particle positions, consistent with the accelerations (|2"|) 
from the Poisson equation AnGp = — V • a. The problem of estimating a smooth density from scattered data points 
(in several dimensions) is generic to many applications in science. A number of solutio ns a re known. One of them is 
to widen the <5-spikes in the density estimate to a finite size, resulting in the estimator [127| | 

i ^ ' 

where ?y(£) is the dimensionless kernel function (normalised to unit integral) and e the softening length. The corre- 
sponding estimator for the potential then follows by application of the Poisson equation, or equivalently, by replacing 
the Greens function —G/\x — x'\ with the potential generated by a mass distribution of the density kernel 

^) = -E^(^) (44) 
where Attt] = — f; -2 ^ 2 ^')' ( a pri me denoting differentiation). For example, the commonly nsed Plummer softening 

S(x)=-GJ2 7 = j (45) 



corresponds to replacing each particle by a Plummer [128] sphere of scale radius e and mass /i,, i.e. 

V (0 - (3/47T) (e + I)" 5 / 2 and <p® = (£ 2 + ly 1 ' 2 . (46) 

3.4.1 Softening as a method to suppress close encounters 

The softening of gravity has two important aspects. First, at close distances r = \xi — x\ the force no longer diverges 
as r~ 2 , but for r < e actually decays to zero. As a consequence, the force estimated from all particles is a smooth 
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and continuous function, such that integrating the equations of motion is much simpler than for the un-softened case 
(required with collisional TV-body methods). This also means that the effect and importance of close particle encounters 
is suppressed. This is exactly what we want, because close encounters are not described by the one-body distribution 
function /, but by the 2-body distribution function, the next in the BBGKY hierarchy (see also footnote ITTj) . The 
minimum softening length required to prevent large-angle deflections during close encounters is 

e 2 body ~ Gp/a 2 (47) 

with p the particle mass and a the typical velocity dispersion [129( . 

In other words, close encounters between the integrated trajectories are numerical artifacts and by softening the 
forces we eliminate their (artificial) effects and, at the same time, considerably simplify the task of time integration 
compared to collisional iV-body codes. Unfortunately, two-body relaxation is driven not only by close encounters, but all 
octaves of impact parameter contribute equally. Therefore, for ce so ftening does not, contrary to some common opinion, 
much reduce the artificial two-body relaxation (only by ~ 2, [l30j |b This implies that artificial relaxation effects may 
affect the high-density regions of TV-body simulations (where idyn and hence i rc iax is shortest, see equation [1]) . Thus 
while we aim to model collisionless systems, the simulations themselves may still suffer from small-angle deflections. 



3.4.2 Softening as optimal force estimation: the choice of the softening kernel 

The second aspect of force softening is a systematic reduction of gravity at close distances: in the limit of N — > oo but 
fixed e, the gravity estimated by (l4"4l disagrees with that of the system modelled. This bias of the average iV-body 
force is the price for the reduction in shot noise (i.e. suppression of close encounters). The presence of this force bias 
implies that simulation results on scales smaller than a few e are unreliable. 

The overall force error is a combination of this bias and the noise, measured by the variance of the force estimate, 
and depends on the system modelled, the number N of particles, the softening length e and the softening kernel rj. 
For small e, th e bias and variance for the estimated force F can be approximated analytically using a local Taylor 
expansion jl3l| 

bi a s{F(x)} = a e 2 GVp(x)+a 2 e A GVV 2 p(x) + O(e 6 ) with a k = -±—±-r-. d££*+VO, (48) 

(fc + 3)! J 

p CO 

with &=(47r) 2 / d££ 2 »j(0 ( 49 ) 
Jo 

with p(x) and M the (smooth) density of the system modelled and its total mass, respectively. Thus, the total force 
error, bias{F(a;)} 2 +var{f 1 (a;)}, becomes minimal for e op t oc N^ 1 / 5 , depending on the system and the softening kernel. 
In particular, it is required that £ 5 ?y(£) — > as £ —> oo for the constant ao to be finite. For kernels, such as Plummer 
softening (for which i] oc at £ 3> 1), which do not satisfy this condition, the expansion above does not work and 
the force bias grows faster than quadratic. 

While these static considerations may not be directly relevant for the dynamical evolution of the iV-body system, 
it appears best to reduce the force bias at least to 0(e 2 ) by using softening kernels with either finite density support 
(corresponding to exact 1/r gravity and rj = for r > e) or a decay steeper than 77 oc £~ J at £ — > 00. 

In view of equation (1481) . one may even reduce the bias further by designing softening kernels with ao = [131] . 
Such kernels have 77 < over some radial range (usually at large £), and compensate the rare under-estimation of 
gravity at small radii with a slight over-estimation at larger radii. The resulting reduction in bias allows a larger 
softening length to suppress the noise. However, equation (|48[) is applicable only when the density p(x) is smooth on 
scales smaller than e. Since many collisionless systems have quite steep central density gradients, where p may even 
formally diverge like a power law, techniques (like this) which require larger softening length appear less advisable. 



iVvar{F(a;)} = b G M e~ p(x) + 0(e) 



3.4.3 The choice of the softening length 

While e2body fequation l47l) represents a minimum softening length required to suppress artificial large-angle deflections, 
larger values simplify the time integration further and hence reduce the computational costs. Another useful criterion 
is that the maximum inter-particle force shall not exceed the typical mean-field strength. For the situatio n of a single 
stellar system of mass M and radius R, this translates to Gp/e 2 < GM/R 2 . Replacing M = Np, we find [132| 

e min ~ R/VN. (50) 

This is significantly larger than e2body ~ 2R/N (which follows from eauationl47lwith a 2 ~ GM/2R). Most practitioners 
are guided by these simple considerations, and p racti cal convergence studies regarding the best choice for e are lacking. 
A notable exception is the work of Power et al. |132| . who suggest for cosmological simulations e ~ 4e m i n - 
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3.4.4 Adaptive individual softening 

The local resolution of an TV-body system is determined by the particle number density and an obvious idea is to use 
smaller softening lengths in high-density regions. This leads to the concept of individual e, with e| = (ef + e?)/2 the 
softening used in the interaction between particles i and j (other symmetrisations are possible, but this particular one 
allows an efficient approximation when using the tree code [133| |). The individual softening lengths are adapted, for 
example such that efpi remains constant, equivalent to the adaption of individual smoothing lengths in smoothed- 
particle hydrodynamics (SPH, |134| see also Lodato & Cossins, this issue). 

When implementing such a scheme, two things must be ensured in order to guarantee the validity of the N- 
body method. First, the adaption of the softening length s mu st be time- reversible. This can be achieved to sufficient 
accuracy by a technique equivalent to that used in SPH [l35l appendix Al] . Second, since Ci depends (implicitly) on 
the positions of particle i and its neighbour s, the Af-body force d<P/dxi contains additional terms, which must be 
included, as outlined by Price & Monaghan [l36j |. to preserve the Hamiltonian character of the method and hence 
energy conservation. 

3.5 Force computation 

In collisionless A^-body methods the force is only ever an estimate, which unavoidably carries with it an estimation 
error (which is reduced by force softening, see above). Therefore, we may as well use less accurate methods for the ac- 
tual calculation of the estimated forces than the computationally expensive direct summation, i.e. the straightforward 
implementation of equation (|44p or (|4l)|) . Because the computational effort of any A^-body method is always dominated 
by the calculation of the gravitational forces, considerable effort has been invested into the design of fast force calcu- 
lation algorithms, resulting in many different methods, which we describe in some detail below. All of these methods 
are substantially faster than direct summation and together with the simplifications for the time-integration due to 
force softening, allow A?~ to be ~ 4 orders of magnitude higher in collisionless than collisional A^-body simulations. 

3.5.1 Approximating direct summation 

A number of methods are based on approximating the direct summation 

S(x b ) = - ^2 Ha <t>{x b - x a ) (51) 

a 

(corresponding to equation 03] with G and e absorbed into </>) by replacing the contributions from all particles within 
a local group by a single expression. 

The tree code was pioneered by Barnes & Hut 137] in 1986 and uses a hierarchical spatial tree to define localised 
groups of particles. Because stellar systems are often highly inhomogeneous, this is much better than defining groups 
by cells of an equidistant mesh (when few cells would contain most of the particles). With the usual oct-tree each cubic 
cell containing fewer than n max particles is split into up to eight child cells of half their parent's size. This results in a 
tree-like hierarchy of cubic nodes with the root box, containing all particles, at its bottom. The particles within each 
of the tree nodes constitute a well-defined and localised group. The approximation used in the tree code is formally 
obtained by Taylor expanding the kernel function cf> in x a around some expansion centre za of the grourj^l 

<j){x b - sc a ) « V — (x a ~ z A )" V"<j>{x b - z A ), (52) 
L — ' n! 

n|<P 

where p is the expansion order. By inserting this expansion into (|51[) we obtain for the potential from the group A the 
multi-pole expansion 

S A (x b ) = - ^2 n a 4>{x b - x a ) f=a - y~] M n (z A ) D n (x b - z A ) (53) 

a£A |n|<p 

with the derivatives D„(r) = V n 0(r) and the multipoles of group A w.r.t. its expansion centre za 

M„(Z A ) = Ha^-^-{x a - Z A ) n . (54) 

a£A 

15 Using the multi-index notation n = (n x , n y , n z ) with > 0, n = |n| = n x + n y + n z , r" = r^. x ry y r"" , and n! = nj n y \ n z \. 
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Fig. 5. Left: computation of the force for one of 100 particles (asterisks) in two dimensions (for graphical simplicity) using 
direct summation: every line corresponds to a single particle-particle force calculation. Middle: approximate calculation of the 
force for the same particle using the tree code. Cells opened are shown as black squares with their centres z indicated by solid 
squares and their sizes w by dotted circles. Every green line corresponds to a cell-particle interaction. Right: approximate 
calculation of the force for all 100 particles using the tree code, requiring 902 cell-particle and 306 particle-particle interactions 
(0 = 1 and n max = 1), instead of 4950 particle-particle interactions with direct summation. 



For the un-softened case (<f> = l/\r\), this series converges in the limit p — » oo if \xb — za\ > max a {|a; a — za\}, i.e. if xt 
is outside a sphere centred on za and containing all x a . Furthermore, if the centre of mass of the group A is chosen as 
its expansion centre za, the dipole vanishes such that the zero-order expansion (p = 0) is actually first-order accurate. 
Because of its great simplicity, this approach is in fact a common choice for practical implementations of the tree code. 

In a preparatory step, the multipole moments and sizes wa satisfying wa > max Q {|a; Q — za\} are computed 
for each tree cell. (Both is best done recursively, exploiting the results from the daughter cells via the shifting formula 

M n (z + x) = |j-M n _ k (z), (55) 

sometimes called the 'upward pass'.) The gravity at any position x from all the particles within tree cell A is then 
simply approximated by applying equation (|53D iQ Owa < r = \x — za\ with some opening angle 6 < 1. Otherwise, 
the sum of the potentials obtained by applying the same algorithm to the daughter cells is used (or, if the cell is a tree 
leaf, from direct summation over all particles within the cell). Of course, the force is computed as the derivative of 3. 

Fig. [5] demonstrates the working of the tree code and compares it graphically to the direct summation approach, 
by applying both to the task of computing the force for one of 100 particles (left and middle panels). Obviously, the 
tree code requires much fewer calculations — it is straightforward to show that the number of force computations per 
particle scales like the depth of the tree, i.e. In AT, such that the cost of computing all N forces is 0(N In N). 

The fast multipole method (FMM) a lso works with localised particle groups and for stel lar systems is best implc 



(FMM) £ 

mented using a tree structure [139L Il40j ] . although the original proposal used a fixed grid [l4l| . In addition to expanding 
the Greens function (j>{xb — x a) at the source positions x a (as for the tree code), it also expands it at the sink positions 
Xb (with zb the centre of the group B of sink particles, see also the left panel of Fig. [6]): 



cj>(x b - x a ) « E ( x * - z bT ( x » ~ z aT V n+m 4>(z B - z A ). (56) 

|n|<p |m|<p— n 

Inserting this into (|5"Tj) . yields for the potential generated by all particles in A and at any position Xb within B 

\n\<p 

F n (z B )= Y M m (z A ) D n+m (z B ~ z A ) (58) 

\m\<p—n 



16 Ensuring convergence of the series. Curiously, some early implementations used for wa simply the linear size of the cubic 
cell, when wa > maxajlcca — za\} is not g uaranteed and the approximated forces can be catastrophically wrong, resulting in 
the infamous 'exploding galaxies' bug [1381 ] . 
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Fig. 6. Left: Illustration of the geometry for the Taylor expansion used with the fast multipole method. Right: approximate 
calculation of the force for the same 100 particles as in Fig. fusing the FMM, requiring 132 cell-cell (blue), 29 cell-particle 
(green), and 182 particle-particle (red) interactions {9 — 1 and n max = 1). 



with the multipole moments M m (zA) defined in equation (|54[) . The field tensors F„(zb) are the coefficients of the 
Taylor series (|57[) for the potential around zb- This dual expansion 'at both ends' of all interactions considerably 
speeds up the simultaneous computation of gravity for all particles, but brings no advantage over the tree code when 
computing the force at a single position. The FMM algorithm works these equations backwards, starting with an 
upward pass (as for the tree code) to compute the multipole moments M m (zA) and widths wa for all cells. 

The second part is the interaction phase, when the field tensors are evaluated for each cell. This is achieved by the 
following algorithm starting with the root-root interactional. If for a mutual cell-cell interaction 8(wa + wb) < r = 
\za — Zb\, then the field tensors for the interactions A — > B and B — >• A are computed according to equation and 
added to F„(zb) and F„(za), respectively. Otherwise, the interaction is split into up to 8 new interactions by opening 
the bigger (in terms of w) of the two cells. A cell self-interaction (like the initial root-root interaction) is performed by 
simple direct summation if the cell contains only a few particles, otherwise it is split into up to 36 new interactions. 

Finally, in a down- ward pass the contributions from the parent cells are added to the field-tensors of their daughters 
after applying the shifting formula 

F n (z + x)= ^[^(4 (59) 

|k| <p— n 

followed by the evaluation of gravity via equation (|57|) at the sink positions within leaf cells. The total computational 
costs of this algorithm are dominated by the interaction phase, which only requires O(N) interactions for the compu- 
tation of all N particle forces. This represents a substantial reduction from the 0(N 2 ) for direct summation. It is also 
a factor > 10 faster than the tree code for typical N > 10 6 . 

Some practical considerations 

1. The simple convergence criterion \r\ < 9w commonly used with the tree code and the FMM is merely geometric 
and therefore controls the relative error for each interaction [140} 

|v* A _> fl | - {i-oy ■ [ > 

However, usually the forces from small (and hence nearby) cells are smaller than from bigger distant cells such 
that the absolute force error is dominated by the few interactions with big cells. A better approach is therefore to 
try to balance the force errors by a non-geometric opening criterion, for example us ing a mass-dependent opening 
angle. With this, one can obtain a total cost < O(N) at fixed approximation error [l40l |. 

2. For the un-softened Greens function <j) = l/r, the multipole expansion is reduced from three to two-dimensional 
indices when using spherical harmonics. Essentially, this exploits the fact that -D n +2x + -Dn+29 + F) n+ 2i = for any 
n (as a consequence of = V 2 0), such that the number of independent terms is reduced from ( P p 3 ) to (p + l) 2 . 

This reduces the costs for the FMM intera ction operation l|58p from 0(p e ) to 0(p 4 ), w hich may be further reduced 
to 0(p 3 ) (by fast rotation methods, e.g. [l42j ]). 0(p 2 \np) (by fast Fourier methods [143]), or even 0(p 2 ) (using 
plane wave approximations |l44j ). 

17 Here, we describe the falcON algorithm [l40f | . but essentially any multipole-based method which expands the Greens functions 
both at the source and sink positions and therefore performs cell-cell interactions qualifies as FMM. 
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Fig. 7. Logical interaction plots for the computation of all mutual forces between the same N = 100 particles as in Fig. [5] 
Left: direct summation: every red dot corresponds to the force computation for one of N(N — l)/2 particle pairs. Middle: tree 
code: the approximated force for a particle i from a cell containing many j is represented by a green line (particles are ordered in 
their tree order: all particles within a tree cell are contiguous). Forces between close neighbours often cannot be approximated, 
resulting in the clustering of red dots along the diagonal. Right: using the FMM code falcON: each blue box corresponds to 
the approximation for the force between two cells. Unlike with the tree code, green lines represent mutual interactions. 



3. Both the distribution of force errors and the total computational costs depend on the opening criterion, the expan- 
sion order, and other numerical details. For an optimal choice of these numerical parameters, the computational cost 
depends non-trivially on the required approximation error. With collisionless TV-body codes one usually requires 
only a modest relative force accuracy of ~ 10~ 3 , in which case low-order techniques are sufficient. 

4. Fig.[7]gives an alternative graphical comparison of direct summation, the tree code, and FMM for the computation 
of all N forces. With direct summation, each pair-wise interaction is considered only once because the mutual 
forces satisfy F$ = —Fji, according to Newton's third law. 

With the tree code, this natural symmetry between sinks and sources is broken: each interaction is one-sided. 
As a consequence, the full interaction matrix (except for the diagonal) has to be approximated and Newton's third 
law is violated, implying the loss of total-momentum conservation. 

With the FMM on the other hand, each interaction is mutual, exploiting the natural symmetry of the problem 
and satisfying Newton's third law for each particle pair (though the approximated Fy is not exactly aligned with the 
separation vector Xi — Xj). Only the upper half of the interaction matrix needs to be approximated. Further away 
from the diagonal, a single cell-cell interaction (blue box) approximates ever more particle-particle interactions. 

5. From Fig. [7] one can also see that for the tree code the workload can be easily split between many processors. 
This is more difficult with direct summation or the FMM. For example, when distributing the particles amongst 
processors, it is not obvious which computes the forces between particles (or cells) residing on different processors 
(but see [g). 

Critique Apart from the violation of Newton's third law with the tree code (but not the FMM), the only potential 
short-coming is a discontinuity of the forces: there are some magic boundaries in space on either side of which a 
different approximation will be used, depending on the opening criterion. As a consequence, the approximated forces 
are not conservative across these boundaries and the total energy of the iV-body system is not exactly conserved but 
exhibits some fluctuations, even in the limit of zero time step. Of course, the amplitude of this effect can be reduced 
by decreasing the opening angle 9. 

It appears that this issue has little if any effect on the validity of the iV-body model (there are no obvious 
differences between results obtained using the tree code and methods which do not suffer from this problem). In 
principle it should be possible to design a tree code or FMM without this problem by smoothly interpolating between 
the force approximations on either side of the magic boundaries. The natural adaptivity and efficiency of the tree code 
and FMM make them versatile and powerful force solvers for general-purpose iV-body codes. 



3.5.2 Grid-based methods 

Instead of solving the integral form (|38[) or (1421) of the Poisson equation, as with direct summation and its approxi- 
mations, grid-based methods solve its differential form 



V 2 <P(x) = 4nGp(x) 



(61) 
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after discretisation on a grid. 

Fast-Fourier-transform-based methods achieve this in the Fourier domain where the Poisson equation (|5T|) becomes 
k 2 <P(k) — 4:TrGp(k). The method first estimates the density on the vertices of an equidistant grid by a technique which 
spreads the mass of each particle across neighbouring cells (see [27j for details). Next, the Poisson equation is solved, 
using the fast Fourier transform (FFT), for the value of the potential on each grid vertex; and finally the potential 
and force for each particle is found by interpolation between vertices. 

This method is fast (the cost for the FFT and the interpolation are 0(rj gr id lnn gr id) and O(N), respectively), but 
not effective for inhomogeneous particle distributions typical for most iV-body simulations, when in order to resolve 
the high-density regions n^ lid S> N is required. 

The Fourier approach implicitly assumes a periodic tiling of all space with the computational domain, such that it 
actually (approximately) solves 

V 2 ${x) = AnG^2p{x + nL). (62) 

n 

This corresponds not to equation (13"51) but (j4"2"]l . exactly as desired for cosmological TV-body simulations of large-scale 
structure formation. Therefore, FFT-based methods (or hybrid methods using FFT, see below) are predominantly 
used with such simulations. 

The periodic bound ary con di tion s can be avoided either by simply doubling the computational domain in each 
dimension or better by iJamesI ' |l45j method, which subtracts the contributions from the periodic replicas of the 
computational box via a Fourier technique involving surface charges. Combining this with a set of nested grids of 
increasin g re solution enables an efficient FFT-based force solver for inhomogeneous single stellar systems, such as 
galaxies [l46j . 

Multi-grid techniques were pioneered in the west by Brandt [147| in 1977; they also interpolate the density and 
potential between grid and particles, but solve the discretised form of the Poisson equation, i.e. a large but sparse 
matrix equation, using relaxation methods, such as Gauss-Seidel iteration. The basic idea exploits the fact that 
on a coarser grid relaxation occurs faster because information travels faster. The distribution of errors (the difference 
between the actual density and that obtained via the discretisation of V 2 ^ from the current estimate for the potential) 
is first smoothed on the finest grid by a few Gauss-Seidel iterations. After transferring the problem to a coarser grid, 
the process is repeated on coarser and coarser grids until, on the coarsest grid convergence is achieved. Then the 
problem is transferred back to finer and finer grids, each time iterating until convergence. 

Since the costs of an iteration shrinks by a factor eight when going to the next coarser grid, the total cost is 
essen tially determined by the costs of a fixed number of iterations on the finest grid, and hence 0(n gr id) in theory 
[l48j |. However, as far as we are aware, in practical applications for iV-body simulations, the method is not significantly 
more efficient than others. 

The advantage of this technique over the FFT approach is that the grid does not need to be equidistant, but can be 
locally adapted according to the particle density. In fact, the structure of such an adaptively refined mesh is identical 
to that of a shallow oct-tree, as used with the tree-code and FMM. 

Critique Gravitational softening is implicit with grid methods, in contrast to direct summation and its approximations, 
and depends on the grid size. This means that for adaptive meshes the softening may vary along a particle orbit, 
depending on the local mesh resolution. Since softening unavoidably leads to a reduction in the estimated binding 
energy of a particle f £!3.4.2p . these variations introduce an unphysical fluctuation of particle binding energies. In 
non-equilibrium simulations, for example of gravitational collapse, this leads to violation of energy conservation and 
artificial secular evolution. This problem also occurs with individually adapted softening lengths with explicit softening 
(as with the tree code or FMM), but there a solution is known, see .4.41 

With adaptive multi-grid methods, it is natural to use different time steps (within the block-step scheme) for par- 
ticles living on different refinement levels. Such a scheme can be substantially accelerated by advancing the particles 
asynchronously: the particles on the coarser grid remain fixed while those on the finer grid move and their gravity is ap- 
proximated by using the coarser-grid potential as boundary condition. However, such a method is neither Hamiltonian 
nor time symmetric, potentially resulting in artificial secular evolution (and violation of energy conservation). 

3.5.3 Basis function methods 

The i dea o f basis-function methods, pioneered by Clutton-Brock |l49l | in 1972 and later dubbed 'self-consistent field 
code' [l50j . is to expand the mass density into basis functions p n {x) with coefficients C„ 



n 



(63) 



22 



Walter Dehnen, Justin I. Read: iV-body simulations of gravitational dynamics 



where n = (n,l, m) is a three-dimensional set of indices, and estimate the potential as 

S(x) = -GY / C n Mx), (64) 

n 

where 4irp„(x) = — V 2 tjj n (x). Usually, the sets of basis functions used are complete and bi-orthogonal, i.e. satisfy 

<W = Jdxp„(x)ip„'(x), (65) 
S(x - x') = ^2p n (x) p n (x'), (66) 

n 

^ = E*»WW4 (67) 



\x — x' 



where the integral is over all space and the sums include all terms |n| — > oo. Applying the bi-orthogonality relation 
(|r?5|) to the density estimate (|6"5)l gives C„ — Jdx p(x) ip„(x), which upon inserting of the Monte-Carlo estimator for 
the mass density (equation 001 with g a delta-spike) yields 

C„ = y^p,iip„(xi). (68) 



Thus, the basis-function force solver first calculates the coefficients C„ via equation (|68[) and subsequently computes 
potential and force at any position via equation (|64p and its derivative. Since both of these operations are trivially split 
between multiple processes and require minimal communication, this method is ideal for computational parallelism. 

Of course, in practice one has to truncate the expansion at some finite order n max . Ideally, the error made by this 
truncation is small, such that the signal in the neglected coefficients C n >n max is negligible. This is usually achieved by 
choosing the parameters of a given basis-function set such that po already closely matches the system modelled and 
higher-order terms merel y describe deviations. A more systematic approach is to design the basis functions to match 
the system at hand [I5l| but also to truncate the expansion smoothly according to the estimated signal-to-noise in 
the neglected coefficients jl52j |. 



Critique Of course, the basis-function approach is not suitable for modelling wildly dynamic situations, such as 
galaxy mergers, but only for near-equilibrium dynamics, when the stellar system deviates only slightly from the 
smooth zeroth-order basis function. An advantage of the method is the effective softening, which usually is small in 
high-density regions where the the basis functions vary mostly. Unlike Greens-function softening, the method does not 
bias the force of a density cusp (p oc r~ 7 ) if the basis functions contain the same power-law cusp and the A^-body 
system is centred on the origin of the expansion. In fact, off-centring is a significant problem with this method. 

It has been argued that the basis-function method is ideally suited for such near-equilibrium systems, because the 
computational cost scales linear with N . However, this view is too simplistic, since it is futile to increase N but not 
also the force resolution, i.e. n max . Simple arguments based on equivalence to Greens-function softening suggest that 
the optimum resolution for given N requires an incre ase of n max such that the overall computational cost scale like 
0(N 2 ), as for straightforward direct summation |l3l| . 

One may hope to alleviate this problem by a continually adapting the lowest-order basis function [l5ll | such that 
"max can be kept low. However, this approach changes the approximated Greens function 



x-x') = *y] ip„(x) ip„(x') 



(69) 



during the simulation and therefore introduces an artificial time dependence into the approximate Hamiltonian. At 
best, this only destroys energy conservation, but more likely has other adverse effects which are less obvious to identify. 

The basis-function method may be most useful for simulations with constrained symmetry. For example, enforcing 
spherical symmetry simply amounts to setting C n i m oc 5io5 m a when using a basis based on spherical-harmonics. 
This reduces the computational costs substantially, even when including many radial basis functions, thus enabling 
extremely fast spherically symmetric simulations with large N. The basis-function method is certainly very useful as 
force solver for other purposes, for example to approximate the density and/or potential of an external system or to 
model the potential in perturbation analyses. 
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3.5.4 Hybrid methods 

The advantages and disadvantages of the various force solvers naturally lead to the concept of hybrid methods, to 
avoid the respective disadvantages. Most relevant in this context is presumably the usage of FFT-based methods to 
obtain periodic boundary conditions, which are desired in cosmological simulations. The disadvantage of the FFT- 
based method (also called 'PM' particle-mesh) is the lack of resolution and adaptivity on small scales. In the 1980s, the 
early days of cosmological simulations, the combination of PM with direct summation ('PP': particle-particle) for the 
computation of the near-neighbour force deficit (the differe nce b etween the average FFT force and that for the desired 
softening length) enjoyed some popularity as 'P 3 M' codes [l53j . Later, this approach has been improved by replacing 
direct summation with a tree code ('TreePM' |l54j ). but we are not aware of the obvious FMM-PM combination. 
Combination s of the multi-grid method with the FFT are straightforward (the FFT is used as force solver on the 
coarsest grid |l55j ). 

Other hybrid methods combine the basis-function approach in a su bset of th e spatial dimensions with a grid in 
the remaining, for example using spherical harmonics with a radial grid [l56l Il57| . 



3.6 Recent numerical developments and challenges 

Collisionless iV-body simulations have benefitted enormously from the incredible increase in computer power, in 
particular as the computational costs only increase like A In A (for the tree code and grid methods). This combined 
with massive parallelisation on many thousand cores has allowed very large A simulations. A further increase by a 
factor ~ 10 should be possible just by using the FMM as force solver (though implementing the FMM efficiently 
in parallel is challenging). Another welcome numerical advancement would be better time-stepping methods (see the 
discussion at the end of H2.3.3p . 

However, the main challenge of contemporary applications of collisionless A-body methods lie not in the method, 
but in the astrophysic s it m isses out: any non-gravitational interaction. Baryonic matter contributes only 16% of all 
matter on large scales [I16j and inter-stellar gas contributes only little to the mass of individual galaxies (in the Milky 
Way the mass ratio between gas and stars is about 1:9). However, the gas not only interacts gravitationally (with all 
other matter), but can directly dissipate (and absorb) energy in form of radiation, and therefore behaves fundamentally 
differently from point-mass particles. 

One big challenge of contemporary astrophysics is to model the formation of galaxies ab initio. To this end, many 
complex astrophysical processes and phenomena must be modelled, such as the multi-phase nature of the gas, its 
condensation to stars and active galactic nuclei (AGN), and their feedback (via winds and radiation) onto the gas 
(all these process may be summarised as 'baryon physics'). While hydrodynamics is comparatively straightforward 
to add to the method, e.g. via smoothed particle hydrodynamics |134| . most of the baryonic physics is not. This 
is because these processes are themselves not properly understood and can only be modelled via parametric 'sub- 
resolution' models. For example, a gas particle is turned into a star particle according to some parametric model of 
our current understanding of the star formation process. While substantial progress has been made, the challenges 
are still formidable, not only because of our limited astrophysical understanding of the baryon physics, but also since 
much higher numerical resolution may be required for convergence than with simple gravity-only TV-body experiments. 



3.7 Past, recent, and future astrophysical modelling 

As in M2.71 we provide a very brief summary of selected (by personal opinion) highlights of astrophysical results based 
on collisionless A-body simulations (apology to any body who fee ls misse d-out ). 

One of the earliest results was the modelling by I (author?)! in 1972 jl58l . see also H, Il59lp *l of tidal interactions 
of rotationally supported disc galaxies, demonstrating a large variety of observed phenomena, such as tidal arms and 
bridges. Simulations o f galaxy interactions could explain many other observed phenomena such as shells, ripples, and 
other 'fine structure' 160Ml63| |. as well a s ring ga laxies [l64j . Also, the merger origin of elliptical galaxies received 



strong support from A-body simulations [165l ll( 

The bar instability of isolated disc galaxies was discovered in the first simulations of such systems [1671 . I16S 
providing a natural explanation of the high frequency of ba rred galaxies. More recent simulations suggest a close 
connection between the dynamics of bars and outer rings [l69j | and demonstrate the importance of a dark-matter halo 
as an angular-momentum absorber [l70l Il7l| . The buckling instability of bars was first seen in A-body simulations 
[l72j . though the identification of 'boxy' or 'peanut-shaped' bulges as side-on bars was only made later. Simulations of 



18 Sadly, this excellent work is only little-known (63 citations), largely because publishing in the west was extremely difficult 
for Soviet scientists during the cold-war era. The later study by Toomre & Toomre (ApJ 178, 623) with essentially the same 
simulation set-up but only N = 120 (compared to N = 2000) is much more widely known (over 1700 citations). 
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disc galaxies also showed spontaneous formation of transient an d/or long-lived spiral features [l73j j , which prompted 
the theoretical model of swing-amplification by Toomre & Zang [l74j . 

'Cosmological' simulations based on cold dark-matter match the observed large-scale structure of the universe 
[I75lll76j . while hot (e.g. neutrino) da rk-m atter is ruled out [l77j |. The dark-matter haloes formed in such simulations 
possess a universal dens ity p rofile pil Il78j . which varies between p oc r" 1 at r — > and p oc r ~ 3 to r~ 4 at large radii, 
and triaxial shapes [U Il79l | . This is now observationally confirmed on galaxy cluster scales [l80l -fi82:] . Finally, such 
simulations predict a weal th of subs tructure that should be present orbiting the Milky Way, and in the Local Group 
that is not (yet) observed [HI Ell. 

Confronting such predictions, as well as those of almost all applications of collisionless A-body simulations, with 
observations requires an extension of the method to include baryonic physics (see £)3.6p . The goal for the coming 
decades will be realistic simulations of galaxy formation and evolution, which will allow us to test both our current 
cosmological model, and galaxy formation theories. 



4 Validation 



A very important issue with A-body techniques is the validation of the numerical method in general and for the specific 
purpose in particular. While a 100% validation is never possible, there are several powerful validation techniques 
available. In particular for collisionless A-body techniques, every project requires a validation to ensure that the 
numerical method (which includes the values of numerical parameters such as A) guarantees the desired numerical 
accuracy, thus avoiding systematic errors driven by numerical artifacts. 

Validation of the method Before even considering the application of a certain TV-body program (collisional or colli- 
sionless) for a particular modelling purpose, the values of numerical parameters (controlling time step, force approx- 
imation, etc.) required for correctness must be established. This can be done in two ways: first by simulating simple 
but non-trivial situations for which the correct outcome is known from other means (such a perturbation analysis or, 
in the case of collisional A-body code by using any of the alternative methods of £|2.6[) ; and second (for collisionless 
TV-body methods) by re-simulating a certain situation with ever better numerical accuracy, which should converge to 
a (hopefully correct) answer. 

Validation by Monitoring One of the most straightforward validation requirements is the actual numerical conserva- 
tion of total energy, momentum, and angular momentum. While with a finite time step energy conservation can never 
be fully achieved, a relative energy conservation to 3-4 digits is usually considered sufficient in practice for collisionless 
A-body techniques, while for collisional methods 5-6 digits appear to be the norm. A systematic trend of energy hints 
at an artificial secular evolution and is much more problematic than mere fluctuations. The same applies to momentum 
and angular momentum. 

In order to monitor whether gravitational softening introduces significant bias (if for instance many particles overlap 
within the softening length), one can compare the virial W and the potential energy V, defined as 

W = ^/Xi Xi ■ V & se lt(Xi) + ^cxt(^) , V = [itself (aJi) + ^extfci) (70) 



with <P SC \{ and <? C xt denoting, respectively, the (approximated and softened) A-body potential and any external po- 
tential. For pure Newtonian gravity (unsoftened) , one has W = V and deviations from this equality are indicative of 
the global importance of the gravity reduction due to softening. 

Validation by simulation When the effect of some particular perturbation is to be modelled, for example of an infalling 
satellite, it is important to ensure that, at the resolution at which the simulation results will be interpreted, a control 
simulation without the perturbation behaves as expected. Sometimes, this may not be sufficient, for example when 
the perturber gives rise to additional astrophysical effects not present in and hence not scrutinised by the control 
simulation. In such a situation one must employ convergence tests: an equivalent simulation at significantly higher 
resolution should give consistent results. Even this may fail when the simulations converge to the wrong answer. 

Validation by comparison The only way to to shield against this problem is to re-simulate with a completely dif- 
ferent A-body method. For example, various A-body tests have been done within the ESF-funded astro-sim project 
(http://www.astrosim.net/code). With collisionless A-body methods this is often the only reliable validation tech- 
nique, since there are hardly any non-linear problems with solutions known from other methods (against which one 
could compare). Such proje cts a re important and have demonstrated that cosmological codes now agree at the 10% 
level for important metrics [185| . This is excellent progress, but not good enough for next generation cosmological 
probes like Euclid (http://sci.esa.int/euclid). 
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Validation and interpretation Simulation results, in particular when unexpected, should never be trusted at face 
value, but an effort must be made to understand them in astrophysical terms. This is very often non-trivial, because 
the highly non-linear and complex dynamics is not compatible with any simplifying assumptions that would allow 
insight by means of approximate analytical models. However, unlike the situation with real stellar systems and galaxies, 
the iV-body model can be analysed ("observed") in full six-dimensional phase-space allowing a much better handle 
on the dynamical processes than nature itself. Contemporary analysis techniques for A^-body data are still somewhat 
immature, but their discussion beyond the scope of this paper. 



5 Conclusion 

We have reviewed some of the latest numerical techniques for modelling collisional and collisionless iV-body systems. 
Our focus throughout has been on purely gravitational A^-body simulations, with a view to presenting the key numerical 
algorithms. Over the past ~ 50 years of iV-body calculations, the held has undergone dramatic changes. Improved 
software algorithms, specialised hardware, and efficient parallel programming have meant that N has kept pace with 
Moore's Law, nearly doubling every two years. This has allowed us to simulate the most massive star clusters, galaxies, 
and the Universe as a whole with increasing precision. 

It is clear that our modelling of gravity is in good shape. Different codes and techniques give converged results, 
while both collisional and collisionless simulations continue to push to larger N. However, the coming challenge will 
be building believable models of baryonic processes in the Universe. This is beyond the scope of this short review, but 
is the key challenge for future iV-body simulations on all scales. 
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